Well-Defined Benchmarking Method For Second Generation Read Mapping

Authors

  • Manuel Holtgrewe
  • Anne-Katrin Emde
  • David Weese
  • Knut Reinert
Abstract

We note that each match with distance ≤ k − 2 implies at least one match on both sides of it. Figure 1 in the main article shows an example, and the same effect can be seen in Figure 2 in the main article: for k = 5, the third end position of the third lower branch in the left tree implies feasible matches to the left and right of it. This problem is partially solved by the definition of neighbour equivalence in Section 2.4 of the main article. Merging matches in this way, however, runs into a difficulty: for k = 4, the end position marked with ? (at the fourth lower branch of the left tree) separates the matches to the left and right of it. As explained in Section 2.4 of the main article, alignments sharing their trace are essentially the same, so it is desirable to merge the matches to the left and right of such separating positions. This is the reasoning behind defining trace equivalence and combining it into ≡.
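
The observation that a good match brings feasible matches at neighbouring end positions with it can be illustrated with a minimal Python sketch. It uses Sellers' semi-global dynamic programming, which is not necessarily the procedure of the main article, to compute the smallest edit distance of a read for every end position in a reference. The function names, the toy sequences, and the simple grouping of contiguous feasible end positions are assumptions made for illustration; they do not reproduce the formal definitions of neighbour or trace equivalence from Section 2.4.

```python
def end_position_distances(text, read):
    """Sellers' algorithm (illustrative helper, not from the article).

    Returns a list whose entry j is the minimum edit distance between
    `read` and any substring of `text` that ends at index j (inclusive).
    """
    m = len(read)
    prev = list(range(m + 1))  # column for the empty text prefix
    result = []
    for c in text:
        cur = [0] * (m + 1)    # cur[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (read[i - 1] != c),  # match or mismatch
                prev[i] + 1,                       # extra text character
                cur[i - 1] + 1,                    # extra read character
            )
        result.append(cur[m])
        prev = cur
    return result


def feasible_intervals(distances, k):
    """Group contiguous end positions with distance <= k into runs.

    This is a simplification standing in for the article's equivalence
    classes; it returns (first, last) index pairs.
    """
    intervals = []
    start = None
    for pos, d in enumerate(distances):
        if d <= k:
            if start is None:
                start = pos
        elif start is not None:
            intervals.append((start, pos - 1))
            start = None
    if start is not None:
        intervals.append((start, len(distances) - 1))
    return intervals


if __name__ == "__main__":
    # Toy data (hypothetical): the read occurs exactly once in the text.
    distances = end_position_distances("TTACGTTT", "ACGT")
    print(distances)
    print(feasible_intervals(distances, k=1))
```

In this toy case the exact occurrence ends at index 5, and with k = 1 the end positions directly to its left and right are feasible as well. Distances at adjacent end positions differ by at most one, so every sufficiently good match is surrounded by further feasible end positions in this way.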

Similar resources

Next-generation sequencing algorithms: from read mapping to variant detection

Next-Generation-Sequencing (NGS) has brought on a revolution in sequence analysis with its broad spectrum of applications ranging from genome resequencing to transcriptomics or metagenomics, and from fundamental research to diagnostics. The tremendous amounts of data necessitate highly efficient computational analysis tools for the wide variety of NGS applications. This thesis addresses a broad...

The mapping task and its various applications in next-generation sequencing

The aim of this thesis is the development and benchmarking of computational methods for the analysis of high-throughput data from tiling arrays and next-generation sequencing. Tiling arrays have been a mainstay of genome-wide transcriptomics, e.g., in the identification of functional elements in the human genome. Due to limitations of existing methods for the data analysis of this data, a novel...

Next generation sequencing data of a defined microbial mock community

Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. We here repo...

MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping

MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide cons...

Publication date: 2011